aws_route53_record: add create,update,delete timeout options for the resource #11895

abhinavdahiya · 2020-02-04T22:54:08Z

Community Note

Please vote on this pull request by adding a 👍 reaction to the original pull request comment to help the community and maintainers prioritize this request
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for pull request followers and do not help prioritize the request

ChangeRecordSets now uses the AWS SDKs inbuilt retry method instead of StateConf change retry loop for 5 minutes.

Release note for CHANGELOG:

* ChangeRecordSets now uses the AWS SDKs inbuilt retry method instead of StateConf change retry loop for 5 minutes.

Output from acceptance testing:

$ make testacc TESTARGS='-run=TestAccXXX'

...

Closes #11896

bflad

Hi @abhinavdahiya 👋 Thank you for submitting this.

We have found in practice that generally when folks are asking for customizable timeouts, that usually there is actually another bug or change needed in the resource logic. This is usually due to how Terraform Plugin SDK helpers expect certain operations to complete (usually immediately) and how the AWS Go SDK causes operations to complete (potentially automatically retrying for extended periods of time without returning during throttling), but also sometimes due to the Terraform AWS Provider needing to tell the AWS Go SDK to automatically retry certain API responses that are not already retried automatically.

We went through a large effort to fix a class of these in #7873. This covered resource.Retry() cases, but did not cover (resource.StateChangeConf).WaitForState() since we had received less of these reports in practice and the fix implementation is markedly different. The fix in those cases was to retry one last time without time bounding so operators do not need to manually figure the appropriate timeframe out in their environment and can instead rely on the AWS Go SDK max retries (still configurable, but by default retries for roughly an hour on throttling).

In this specific case though, we instead can add the PriorRequestNotComplete error code as a retryable error code for the ChangeResourceRecordSets operation in aws/config.go. An example of how this was done for a similar issue with a different resource can be found here:

https://github.com/terraform-providers/terraform-provider-aws/blob/3c84f53f15b4e4d4b9cc744478d37c97e9b9166e/aws/config.go#L591-L597

Something like the below should get us close:

	client.r53conn.Handlers.Retry.PushBack(func(r *request.Request) {
		if r.Operation.Name == "ChangeResourceRecordSets" {
			if isAWSErr(r.Error, route53.ErrCodePriorRequestNotComplete, "") {
				r.Retryable = aws.Bool(true)
			}
		}
	})

Then we can remove the changeRoute53RecordSet usage of StateChangeConf and associated 5 minute timeout completely. If you can change this implementation to the above, we can get this in to hopefully solve your (and others) problem without requiring operator intervention. Thanks!

abhinavdahiya · 2020-02-07T19:37:45Z

Hi @bflad ! thanks for the detailed response and the alternative implementation suggestion..

When i started looking into the adding PriorRequestNotComplete as retryable error, i found out that it was already part of the throttle code set in SDK.. https://github.com/aws/aws-sdk-go/blame/3acad1210f5c4feb4866ff9874b5f3f5895bd96d/aws/request/retryer.go#L93 and has been for some time now.

So the logic from #7873 seemed like a better way to solve the timeout issue. Therefore i updated the functions to use that workflow, allows aws eventual consistency to kick in and then depend on SDK's internal backoff for PriorRequestNotComplete.

PTAL.

bflad

Thanks for those updates, @abhinavdahiya 😄 Two more minor things and this should be good to go. 👍

bflad · 2020-02-07T19:48:01Z

aws/resource_aws_route53_record.go

+	var out *route53.ChangeResourceRecordSetsOutput
+	err := resource.Retry(1*time.Minute, func() *resource.RetryError {
+		var err error
+		out, err = conn.ChangeResourceRecordSets(input)
+		if isAWSErr(err, "InvalidChangeBatch", "") {
+			// This means the record is already gone
+			return nil
+		}
+		return resource.NonRetryableError(err)
+	})
+	if isResourceTimeoutError(err) {
+		out, err = conn.ChangeResourceRecordSets(input)
+		if isAWSErr(err, "InvalidChangeBatch", "") {
+			// This means the record is already gone
+			return out, nil
+		}


This can be simplified without resource.Retry() at all since there are no retryable conditions 👍

out, err := conn.ChangeResourceRecordSets(input) if isAWSErr(err, "InvalidChangeBatch", "") { return out, nil } return out, err

See also: #11864

@bflad

+1

fixed with f5cf401

bflad · 2020-02-07T19:50:43Z

aws/resource_aws_route53_record.go

+			log.Print("[DEBUG] Hosted Zone not found, retrying...")
+			return resource.RetryableError(err)
+		}
+		return resource.NonRetryableError(err)


Nit: In the future we will likely require the usage of resource.NonRetryableError()/resource.RetryableError() with non-nil errors:

Suggested change

return resource.NonRetryableError(err)

if err != nil {

return resource.NonRetryableError(err)

}

return nil

See also: hashicorp/terraform-plugin-sdk#199

+1
fixed with f5cf401

…DK backoff `PriorRequestNotComplete` is already part of the list of throttle codes [1], so if we use the SDK inbuilt backoff we shouldn't need custom state change logic that retries for 5 minutes. So this changes the changeRoute53RecordSet to resource.Retry and last retryOnTimeout from issue 7873 [2], such that we send request in resource.Retry for 1 minute and if there is a timeout, we send another request that then uses the AWS SDK default retry logic... while the deleteRoute53RecordSet now directly makes the request using SDK without any external retries. [1]: https://github.com/aws/aws-sdk-go/blob/3acad1210f5c4feb4866ff9874b5f3f5895bd96d/aws/request/retryer.go#L93 [2]: hashicorp#7873

bflad

Looks great, thanks @abhinavdahiya 🚀

Output from acceptance testing:

--- PASS: TestAccAWSRoute53Record_disappears (138.04s)
--- PASS: TestAccAWSRoute53Record_Alias_S3 (140.04s)
--- PASS: TestAccAWSRoute53Record_txtSupport (144.45s)
--- PASS: TestAccAWSRoute53Record_caaSupport (157.92s)
--- PASS: TestAccAWSRoute53Record_generatesSuffix (160.83s)
--- PASS: TestAccAWSRoute53Record_Alias_Elb (166.20s)
--- PASS: TestAccAWSRoute53Record_underscored (169.23s)
--- PASS: TestAccAWSRoute53Record_Alias_Uppercase (178.33s)
--- PASS: TestAccAWSRoute53Record_basic_fqdn (184.78s)
--- PASS: TestAccAWSRoute53Record_geolocation_basic (184.91s)
--- PASS: TestAccAWSRoute53Record_failover (192.05s)
--- PASS: TestAccAWSRoute53Record_weighted_basic (197.08s)
--- PASS: TestAccAWSRoute53Record_latency_basic (197.18s)
--- PASS: TestAccAWSRoute53Record_weighted_to_simple_basic (200.10s)
--- PASS: TestAccAWSRoute53Record_spfSupport (202.82s)
--- PASS: TestAccAWSRoute53Record_basic (212.78s)
--- PASS: TestAccAWSRoute53Record_wildcard (223.55s)
--- PASS: TestAccAWSRoute53Record_disappears_MultipleRecords (271.22s)
--- PASS: TestAccAWSRoute53Record_empty (119.67s)
--- PASS: TestAccAWSRoute53Record_longTXTrecord (122.34s)
--- PASS: TestAccAWSRoute53Record_multivalue_answer_basic (126.24s)
--- PASS: TestAccAWSRoute53Record_AliasChange (159.71s)
--- PASS: TestAccAWSRoute53Record_TypeChange (167.90s)
--- PASS: TestAccAWSRoute53Record_SetIdentifierChange (168.08s)
--- PASS: TestAccAWSRoute53Record_allowOverwrite (159.12s)
--- PASS: TestAccAWSRoute53Record_weighted_alias (351.02s)
--- PASS: TestAccAWSRoute53Record_Alias_VpcEndpoint (485.16s)

ghost · 2020-02-14T15:47:44Z

This has been released in version 2.49.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!

brings in fix for aws_record timeout [1] and other fixes mentioned in CHANGELOG [2] updated using ``` go get github.com/terraform-providers/terraform-provider-aws@f0f304894df67616dfbd675bc9687a7db266ad41 ``` Using the tag failed with error ``` go get github.com/terraform-providers/terraform-provider-aws@v2.49.0 go: finding github.com/terraform-providers/terraform-provider-aws v2.49.0 go: finding github.com/terraform-providers/terraform-provider-aws v2.49.0 go get github.com/terraform-providers/terraform-provider-aws@v2.49.0: github.com/terraform-providers/terraform-provider-aws@v2.49.0: invalid version: module contains a go.mod file, so major version must be compatible: should be v0 or v1, not v2 ``` Also removes the indirect dependecy on `aws-iam-authenticator` because of [3] [1]: hashicorp/terraform-provider-aws#11895 [2]: https://github.com/terraform-providers/terraform-provider-aws/blob/v2.49.0/CHANGELOG.md#2490-february-14-2020 [3]: hashicorp/terraform-provider-aws#11822

ghost · 2020-03-27T17:15:40Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

abhinavdahiya requested a review from a team February 4, 2020 22:54

bflad added the proposal Proposes new design or functionality. label Feb 5, 2020

bflad suggested changes Feb 5, 2020

View reviewed changes

bflad self-assigned this Feb 5, 2020

bflad added waiting-response Maintainers are waiting on response from community or contributor. and removed needs-triage Waiting for first response or review from a maintainer. labels Feb 5, 2020

abhinavdahiya force-pushed the aws_record_timeout branch from 66aa881 to 82afc9b Compare February 7, 2020 19:33

ghost added size/M Managed by automation to categorize the size of a PR. and removed size/S Managed by automation to categorize the size of a PR. labels Feb 7, 2020

ghost removed the waiting-response Maintainers are waiting on response from community or contributor. label Feb 7, 2020

bflad suggested changes Feb 7, 2020

View reviewed changes

bflad removed the proposal Proposes new design or functionality. label Feb 7, 2020

abhinavdahiya force-pushed the aws_record_timeout branch from 82afc9b to f5cf401 Compare February 7, 2020 21:47

bflad added this to the v2.49.0 milestone Feb 10, 2020

bflad approved these changes Feb 10, 2020

View reviewed changes

bflad added the bug Addresses a defect in current functionality. label Feb 10, 2020

bflad merged commit fd754cb into hashicorp:master Feb 10, 2020

bflad added a commit that referenced this pull request Feb 10, 2020

Update CHANGELOG for #11895

dced906

abhinavdahiya deleted the aws_record_timeout branch February 10, 2020 23:51

ghost locked and limited conversation to collaborators Mar 27, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

aws_route53_record: add create,update,delete timeout options for the resource #11895

aws_route53_record: add create,update,delete timeout options for the resource #11895

abhinavdahiya commented Feb 4, 2020 •

edited

Loading

bflad left a comment

abhinavdahiya commented Feb 7, 2020

bflad left a comment

bflad Feb 7, 2020

abhinavdahiya Feb 7, 2020

bflad Feb 7, 2020

abhinavdahiya Feb 7, 2020

bflad left a comment

ghost commented Feb 14, 2020

ghost commented Mar 27, 2020

aws_route53_record: add create,update,delete timeout options for the resource #11895

aws_route53_record: add create,update,delete timeout options for the resource #11895

Conversation

abhinavdahiya commented Feb 4, 2020 • edited Loading

Community Note

bflad left a comment

Choose a reason for hiding this comment

abhinavdahiya commented Feb 7, 2020

bflad left a comment

Choose a reason for hiding this comment

bflad Feb 7, 2020

Choose a reason for hiding this comment

abhinavdahiya Feb 7, 2020

Choose a reason for hiding this comment

bflad Feb 7, 2020

Choose a reason for hiding this comment

abhinavdahiya Feb 7, 2020

Choose a reason for hiding this comment

bflad left a comment

Choose a reason for hiding this comment

ghost commented Feb 14, 2020

ghost commented Mar 27, 2020

abhinavdahiya commented Feb 4, 2020 •

edited

Loading